1.
Front Immunol ; 15: 1293706, 2024.
Article in English | MEDLINE | ID: mdl-38646540

ABSTRACT

Major histocompatibility complex Class II (MHCII) proteins initiate and regulate immune responses by presentation of antigenic peptides to CD4+ T-cells and self-restriction. The interactions between MHCII and peptides determine the specificity of the immune response and are crucial in immunotherapy and cancer vaccine design. With the ever-increasing amount of MHCII-peptide binding data available, many computational approaches have been developed for MHCII-peptide interaction prediction over the last decade. There is thus an urgent need to provide an up-to-date overview and assessment of these newly developed computational methods. To benchmark the prediction performance of these methods, we constructed an independent dataset containing binding and non-binding peptides to 20 human MHCII protein allotypes from the Immune Epitope Database, covering DP, DR and DQ alleles. After collecting 11 known predictors up to January 2022, we evaluated those available through a webserver or standalone packages on this independent dataset. The benchmarking results show that MixMHC2pred and NetMHCIIpan-4.1 achieve the best performance among all predictors. In general, newly developed methods perform better than older ones due to the rapid expansion of data on which they are trained and the development of deep learning algorithms. Our manuscript not only draws a full picture of the state-of-art of MHCII-peptide binding prediction, but also guides researchers in the choice among the different predictors. More importantly, it will inspire biomedical researchers in both academia and industry for the future developments in this field.


Subjects
Antigen Presentation , Computational Biology , Histocompatibility Antigens Class II , Peptides , Humans , Histocompatibility Antigens Class II/immunology , Histocompatibility Antigens Class II/metabolism , Peptides/immunology , Computational Biology/methods , Protein Binding , Deep Learning , Algorithms
2.
Sci Total Environ ; 923: 171377, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38458463

ABSTRACT

Aflatoxin B1 (AFB1) is a major mycotoxin contaminant found in the environment and in foods. In this study, the molecular initiating events (MIEs) of AFB1-induced steatohepatitis were explored in mice and in a human cell model. We observed dose-dependent steatohepatitis in the AFB1-treated mice, including triglyceride accumulation, fibrotic collagen secretion, enrichment of CD11b+ and F4/80+ macrophages/Kupffer cells, cell death, lymphocyte clusters and remarkable atrophy areas. The gut barrier and gut microbiota were also severely damaged after the AFB1 treatment, and pre-conditioned colitis in the experimental mice aggravated the steatohepatitis phenotypes. We found that macrophages can be pro-inflammatorily activated to an M1-like phenotype by AFB1 through AHR/TLR4/p-STAT3 (Ser727)-mediated mitochondrial oxidative stress. The phenotypes can be rescued by AHR inhibitors in the mouse model and the human cell model. We further showed that this signaling axis is based on the cross-talk interaction between AHR and TLR4. A gene knock-up experiment found that the signaling depends on AFB1 ligand binding to AHR, but not on the protein expression of TLR4. The signaling elevated NLRP3 and two immune metabolic enzymes, ICAM-1 and IDO, that are associated with macrophage polarization. Results from intervention experiments with a natural antioxidant and the AHR inhibitor CH223191 suggest that the macrophage polarization may rely on AHR and ROS. Our study provides novel and critical references for the food safety and public health regulation of AFB1.


Subjects
Aflatoxin B1 , Fatty Liver , Animals , Humans , Mice , Intercellular Adhesion Molecule-1/metabolism , Macrophages/metabolism , Oxidative Stress , STAT3 Transcription Factor/metabolism , Toll-Like Receptor 4/metabolism
3.
ESC Heart Fail ; 10(5): 3102-3113, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37608687

ABSTRACT

AIMS: Coronary artery disease (CAD) is the most common cause of heart failure (HF). This study aimed to identify cytokine biomarkers for predicting HF in patients with CAD. METHODS AND RESULTS: Twelve patients with CAD without HF (CAD-non HF), 12 patients with CAD complicated with HF (CAD-HF), and 12 healthy controls were enrolled for Human Cytokine Antibody Array analysis, and the resulting data were used as the training dataset. Then, differentially expressed cytokines among the different groups were identified, and crucial characteristic proteins related to CAD-HF were screened using a combination of the least absolute shrinkage and selection operator, recursive feature elimination, and random forest methods. A support vector machine (SVM) diagnostic model was constructed based on the crucial characteristic proteins, followed by receiver operating characteristic curve analysis. Finally, two validation datasets, GSE20681 and GSE59867, were downloaded to verify the diagnostic performance of the SVM model and the expression of the crucial proteins; enzyme-linked immunosorbent assay was also used to verify the levels of the crucial proteins in blood samples. In total, 12 differentially expressed proteins overlapped in the three comparison groups, and four optimal characteristic proteins were then identified: VEGFR2, FLRG, IL-23, and FGF-21. The area under the receiver operating characteristic curve of the constructed SVM classification model for the training dataset was 0.944. The accuracy of the SVM classification model was validated using the GSE20681 and GSE59867 datasets, with area under the receiver operating characteristic curve values of 0.773 and 0.745, respectively. The expression trends of the four crucial proteins in the training dataset were consistent with those in the validation datasets and those determined by enzyme-linked immunosorbent assay.
CONCLUSIONS: The combination of VEGFR2, FLRG, IL-23, and FGF-21 can be used as a candidate biomarker for the prediction and prevention of HF in patients with CAD.
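
The abstract above reports classifier quality as the area under the receiver operating characteristic curve. As a minimal sketch of what that number measures, the snippet below computes the AUC as the fraction of positive/negative pairs in which the positive sample gets the higher score (ties count one half). The function name `roc_auc` and the labels and scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: iterable of 0/1 class labels (1 = positive, e.g. CAD-HF).
    scores: classifier decision values, higher meaning "more positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the paper.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of samples, which is fine for cohorts of a few dozen; for large datasets a rank-based formulation would be the usual choice.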

4.
J Chem Theory Comput ; 19(12): 3664-3671, 2023 Jun 27.
Article in English | MEDLINE | ID: mdl-37276063

ABSTRACT

A general limitation of the use of enzymes in biotechnological processes under sometimes nonphysiological conditions is the complex interplay between two key quantities, enzyme activity and stability, where the increase of one is often associated with the decrease of the other. A precise stability-activity trade-off is necessary for the enzymes to be fully functional, but its weight in different protein regions and its dependence on environmental conditions is not yet elucidated. To advance this issue, we used the formalism that we have recently developed to effectively identify stability strength and weakness regions in protein structures and applied it to a large set of globular enzymes with known experimental structure and catalytic sites. Our analysis showed a striking oscillatory pattern of free energy compensation centered on the catalytic region. Indeed, catalytic residues are usually nonoptimal with respect to stability, but residues in the first shell around the catalytic site are, on the average, stability strengths and thus compensate for this lack of stability; residues in the second shell are weaker again, and so on. This trend is consistent across all enzyme families. It is accompanied by a similar, but less pronounced, pattern of residue conservation across evolution. In addition, we analyzed cold- and heat-adapted enzymes separately and highlighted different patterns of stability strengths and weaknesses, which provide insight into the longstanding problem of catalytic rate enhancement in cold environments. The successful comparison of our stability and conservation results with experimental fitness data, obtained by deep mutagenesis scanning, led us to propose criteria for improving catalytic activity while maintaining enzyme stability, a key goal in enzyme design.


Subjects
Enzyme Stability , Protein Stability , Catalytic Domain , Entropy , Catalysis
5.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37225420

ABSTRACT

Enzymatic reactions are crucial to explore the mechanistic function of metabolites and proteins in cellular processes and to understand the etiology of diseases. The increasing number of interconnected metabolic reactions allows the development of in silico deep learning-based methods to discover new enzymatic reaction links between metabolites and proteins to further expand the landscape of existing metabolite-protein interactome. Computational approaches to predict the enzymatic reaction link by metabolite-protein interaction (MPI) prediction are still very limited. In this study, we developed a Variational Graph Autoencoders (VGAE)-based framework to predict MPI in genome-scale heterogeneous enzymatic reaction networks across ten organisms. By incorporating molecular features of metabolites and proteins as well as neighboring information in the MPI networks, our MPI-VGAE predictor achieved the best predictive performance compared to other machine learning methods. Moreover, when applying the MPI-VGAE framework to reconstruct hundreds of metabolic pathways, functional enzymatic reaction networks and a metabolite-metabolite interaction network, our method showed the most robust performance among all scenarios. To the best of our knowledge, this is the first MPI predictor by VGAE for enzymatic reaction link prediction. Furthermore, we implemented the MPI-VGAE framework to reconstruct the disease-specific MPI network based on the disrupted metabolites and proteins in Alzheimer's disease and colorectal cancer, respectively. A substantial number of novel enzymatic reaction links were identified. We further validated and explored the interactions of these enzymatic reactions using molecular docking. These results highlight the potential of the MPI-VGAE framework for the discovery of novel disease-related enzymatic reactions and facilitate the study of the disrupted metabolisms in diseases.


Subjects
Machine Learning , Metabolic Networks and Pathways , Molecular Docking Simulation , Cell Physiological Phenomena
6.
bioRxiv ; 2023 Mar 12.
Article in English | MEDLINE | ID: mdl-36945484

ABSTRACT

Background: Enzymatic reaction networks are crucial for exploring the mechanistic function of metabolites and proteins in biological systems and for understanding the etiology of diseases and potential targets for drug discovery. The increasing number of metabolic reactions allows the development of deep learning-based methods to discover new enzymatic reactions, which will expand the landscape of existing enzymatic reaction networks to investigate the disrupted metabolisms in diseases. Results: In this study, we propose the MPI-VGAE framework to predict metabolite-protein interactions (MPI) in a genome-scale heterogeneous enzymatic reaction network across ten organisms with thousands of enzymatic reactions. We improved the Variational Graph Autoencoders (VGAE) model to incorporate both molecular features of metabolites and proteins as well as neighboring features to achieve the best predictive performance of MPI. The MPI-VGAE framework showed robust performance in the reconstruction of hundreds of metabolic pathways and five functional enzymatic reaction networks. The MPI-VGAE framework was also applied to a homogenous metabolic reaction network and achieved as high performance as other state-of-art methods. Furthermore, the MPI-VGAE framework could be implemented to reconstruct the disease-specific MPI network based on hundreds of disrupted metabolites and proteins in Alzheimer's disease and colorectal cancer, respectively. A substantial number of new potential enzymatic reactions were predicted and validated by molecular docking. These results highlight the potential of the MPI-VGAE framework for the discovery of novel disease-related enzymatic reactions and drug targets in real-world applications. Data availability and implementation: The MPI-VGAE framework and datasets are publicly accessible on GitHub https://github.com/mmetalab/mpi-vgae . Author Biographies: Cheng Wang received his Ph.D. in Chemistry from The Ohio State University, USA.
He is currently an Assistant Professor in the School of Public Health at Shandong University, China. His research interests include bioinformatics and machine learning-based approaches with applications to biomedical networks. Chuang Yuan is a research assistant at Shandong University. He obtained the MS degree in Biology at the University of Science and Technology of China. His research interests include biochemistry & molecular biology, cell biology, biomedicine, bioinformatics, and computational biology. Yahui Wang is a PhD student in the Department of Chemistry at Washington University in St. Louis. Her research interests include biochemistry, mass spectrometry-based metabolomics, and cancer metabolism. Ranran Chen is a master's student in the School of Public Health at University of Shandong, China. Yuying Shi is a master's student in the School of Public Health at University of Shandong, China. Gary J. Patti is the Michael and Tana Powell Professor at Washington University in St. Louis, where he holds appointments in the Department of Chemistry and the Department of Medicine. He is also the Senior Director of the Center for Metabolomics and Isotope Tracing at Washington University. His research interests include metabolomics, bioinformatics, high-throughput mass spectrometry, environmental health, cancer, and aging. Leyi Wei received his Ph.D. in Computer Science from Xiamen University, China. He is currently a Professor in the School of Software at Shandong University, China. His research interests include machine learning and its applications to bioinformatics. Qingzhen Hou received his Ph.D. in the Centre for Integrative Bioinformatics VU (IBIVU) from Vrije Universiteit Amsterdam, the Netherlands. Since 2020, he has served as the head of the Bioinformatics Center in the National Institute of Health Data Science of China and as Assistant Professor in the School of Public Health, Shandong University, China. His areas of research are bioinformatics and computational biophysics.
Key points: Genome-scale heterogeneous networks of metabolite-protein interaction (MPI) based on thousands of enzymatic reactions across ten organisms were constructed semi-automatically. An enzymatic reaction prediction method called Metabolite-Protein Interaction Variational Graph Autoencoders (MPI-VGAE) was developed and optimized to achieve higher performance compared with existing machine learning methods by using both molecular features of metabolites and proteins. MPI-VGAE is broadly useful for applications involving the reconstruction of metabolic pathways, functional enzymatic reaction networks, and homogenous networks (e.g., metabolic reaction networks). By implementing MPI-VGAE to Alzheimer's disease and colorectal cancer, we obtained several novel disease-related protein-metabolite reactions with biological meanings. Moreover, we further investigated the reasonable binding details of protein-metabolite interactions using molecular docking approaches which provided useful information for disease mechanism and drug design.

8.
PLoS Comput Biol ; 18(12): e1010669, 2022 12.
Article in English | MEDLINE | ID: mdl-36454728

ABSTRACT

The ubiquitous availability of genome sequencing data explains the popularity of machine learning-based methods for the prediction of protein properties from their amino acid sequences. Over the years, while revising our own work, reading submitted manuscripts as well as published papers, we have noticed several recurring issues, which make some reported findings hard to understand and replicate. We suspect this may be due to biologists being unfamiliar with machine learning methodology, or conversely, machine learning experts may miss some of the knowledge needed to correctly apply their methods to proteins. Here, we aim to bridge this gap for developers of such methods. The most striking issues are linked to a lack of clarity: how were annotations of interest obtained; which benchmark metrics were used; how are positives and negatives defined. Others relate to a lack of rigor: If you sneak in structural information, your method is not sequence-based; if you compare your own model to "state-of-the-art," take the best methods; if you want to conclude that some method is better than another, obtain a significance estimate to support this claim. These, and other issues, we will cover in detail. These points may have seemed obvious to the authors during writing; however, they are not always clear-cut to the readers. We also expect many of these tips to hold for other machine learning-based applications in biology. Therefore, many computational biologists who develop methods in this particular subject will benefit from a concise overview of what to avoid and what to do instead.


Subjects
Benchmarking , Machine Learning , Amino Acid Sequence , Chromosome Mapping , Knowledge
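
The abstract above advises obtaining a significance estimate before concluding that one method is better than another. One simple way to do this, sketched below under stated assumptions, is an exact paired sign-flip permutation test on per-dataset score differences. The abstract does not prescribe this particular test; the function name `paired_sign_flip_pvalue` and the score lists are hypothetical.

```python
from itertools import product


def paired_sign_flip_pvalue(scores_a, scores_b):
    """Two-sided exact sign-flip test for paired scores.

    Under the null hypothesis that methods A and B are interchangeable,
    each per-dataset difference is equally likely to have either sign.
    All 2**n sign assignments are enumerated, so keep n small (say < 20).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point rounding.
        if stat >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Hypothetical per-dataset AUCs for two predictors, NOT real results.
auc_method_a = [0.72, 0.70, 0.75, 0.68, 0.74]
auc_method_b = [0.70, 0.69, 0.71, 0.66, 0.73]
p_value = paired_sign_flip_pvalue(auc_method_a, auc_method_b)
print(f"two-sided p = {p_value:.4f}")
```

With only five paired datasets the smallest attainable two-sided p-value is 2/32, which illustrates why a consistent advantage on a handful of benchmarks is weak evidence on its own.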
9.
Comput Struct Biotechnol J ; 20: 434-442, 2022.
Article in English | MEDLINE | ID: mdl-35070166

ABSTRACT

Over the past decade, metagenomic sequencing approaches have been providing an ever-increasing amount of protein sequence data at an astonishing rate. These constitute an invaluable source of information which has been exploited in various research fields such as the study of the role of the gut microbiota in human diseases and aging. However, only a small fraction of all metagenomic sequences collected have been functionally or structurally characterized, leaving much of them completely unexplored. Here, we review how this information has been used in protein structure prediction and protein discovery. We begin by presenting some widely used metagenomic databases and analyze in detail how metagenomic data has contributed to the impressive improvement in the accuracy of structure prediction methods in recent years. We then examine how metagenomic information can be exploited to annotate protein sequences. More specifically, we focus on the role of metagenomes in the discovery of enzymes and new CRISPR-Cas systems, and in the identification of antibiotic resistance genes. With this review, we provide an overview of how metagenomic data is currently revolutionizing our understanding of protein science.

10.
Vaccines (Basel) ; 9(12)2021 Dec 14.
Article in English | MEDLINE | ID: mdl-34960221

ABSTRACT

Since 2019, the COVID-19 pandemic has resulted in sickness, hospitalizations, and deaths of the old and young and has impacted global social and economic activities. Vaccination is one of the most important and efficient ways to protect against the COVID-19 virus. In a review of the literature on parents' decisions to vaccinate their children, we found that widespread vaccination was hampered by vaccine hesitancy, especially for children, who play an important role in coronavirus transmission in both family and school. To analyze parents' vaccination decision-making for children, our review of the literature on parents' attitudes to vaccinating children identified the objective and subjective factors influencing their vaccination decisions. We found that the median rate of parents vaccinating their children against COVID-19 was 59.3% (IQR 48.60-73.90%). The factors influencing parents' attitudes towards child vaccination were heterogeneous, reflecting country-specific factors, but also displayed some similar trends across countries, such as the education level of parents. The leading reason for vaccinating children was to protect children, family and others, and concern about side effects and safety was the most important reason for not vaccinating children. Our study informs government and health officials about appropriate vaccination policies and measures to improve the vaccination rate of children and makes specific recommendations on enhancing child vaccination rates.
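
The review above summarizes study-level vaccination rates as a median with an interquartile range. The sketch below shows how such a summary can be computed with Python's standard library. The rates are made-up placeholders rather than values from the review, the helper name `median_and_iqr` is hypothetical, and the "inclusive" quartile convention is an assumption, since the abstract does not say which convention was used.

```python
from statistics import median, quantiles


def median_and_iqr(values):
    """Return (median, Q1, Q3) for a list of at least two numbers."""
    if len(values) < 2:
        raise ValueError("need at least two values to compute quartiles")
    # n=4 yields the three quartile cut points; "inclusive" treats the
    # data as the whole population of included studies (an assumption).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


# Hypothetical per-study vaccination rates in percent, NOT from the review.
rates = [40.0, 50.0, 60.0, 70.0, 80.0]
med, q1, q3 = median_and_iqr(rates)
print(f"median {med:.1f}% (IQR {q1:.1f}-{q3:.1f}%)")
```

Different quartile conventions can shift Q1 and Q3 noticeably when only a few studies are pooled, so the convention is worth reporting alongside the IQR.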

11.
Bioinformatics ; 37(20): 3421-3427, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-33974039

ABSTRACT

MOTIVATION: Antibodies play an important role in clinical research and biotechnology, with their specificity determined by the interaction with the antigen's epitope region, a special type of protein-protein interaction (PPI) interface. The ubiquitous availability of sequence data allows us to predict epitopes from sequence in order to focus time-consuming wet-lab experiments toward the most promising epitope regions. Here, we extend our previously developed sequence-based predictors for homodimer and heterodimer PPI interfaces to predict epitope residues that have the potential to bind an antibody. RESULTS: We collected and curated a high quality epitope dataset from the SAbDab database. Our generic PPI heterodimer predictor obtained an AUC-ROC of 0.666 when evaluated on the epitope test set. We then trained a random forest model specifically on the epitope dataset, reaching AUC 0.694. Further training on the combined heterodimer and epitope datasets improves our final predictor to AUC 0.703 on the epitope test set. This is better than the best state-of-the-art sequence-based epitope predictor BepiPred-2.0. On one solved antibody-antigen structure of the COVID-19 virus spike receptor binding domain, our predictor reaches AUC 0.778. We added the SeRenDIP-CE Conformational Epitope predictors to our webserver, which is simple to use and only requires a single antigen sequence as input; this will help make the method immediately applicable in a wide range of biomedical and biomolecular research. AVAILABILITY AND IMPLEMENTATION: Webserver, source code and datasets at www.ibi.vu.nl/programs/serendipwww/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

12.
Bioinformatics ; 37(14): 1963-1971, 2021 07 15.
Article in English | MEDLINE | ID: mdl-33471089

ABSTRACT

MOTIVATION: Although structured proteins adopt their lowest free energy conformation in physiological conditions, the individual residues are generally not in their lowest free energy conformation. Residues that are stability weaknesses are often involved in functional regions, whereas stability strengths ensure local structural stability. The detection of strengths and weaknesses provides key information to guide protein engineering experiments aiming to modulate folding and various functional processes. RESULTS: We developed the SWOTein predictor which identifies strong and weak residues in proteins on the basis of three types of statistical energy functions describing local interactions along the chain, hydrophobic forces and tertiary interactions. The large-scale analysis of the different types of strengths and weaknesses demonstrated their complementarity and the enhancement of the information they provide. Moreover, a good average correlation was observed between predicted and experimental strengths and weaknesses obtained from native hydrogen exchange data. SWOTein application to three test cases further showed its suitability to predict and interpret strong and weak residues in the context of folding, conformational changes and protein-protein binding. In summary, SWOTein is both fast and accurate and can be applied at small and large scale to analyze and modulate folding and molecular recognition processes. AVAILABILITY: The SWOTein webserver provides the list of predicted strengths and weaknesses and a protein structure visualization tool that facilitates the interpretation of the predictions. It is freely available for academic use at http://babylone.ulb.ac.be/SWOTein/.

13.
Bioinformatics ; 36(5): 1445-1452, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31603466

ABSTRACT

MOTIVATION: The solubility of a protein is often decisive for its proper functioning. Lack of solubility is a major bottleneck in high-throughput structural genomic studies and in high-concentration protein production, and the formation of protein aggregates causes a wide variety of diseases. Since solubility measurements are time-consuming and expensive, there is a strong need for solubility prediction tools. RESULTS: We have recently introduced solubility-dependent distance potentials that are able to unravel the role of residue-residue interactions in promoting or decreasing protein solubility. Here, we extended their construction by defining solubility-dependent potentials based on backbone torsion angles and solvent accessibility, and integrated them, together with other structure- and sequence-based features, into a random forest model trained on a set of Escherichia coli proteins with experimental structures and solubility values. We thus obtained the SOLart protein solubility predictor, whose most informative features turned out to be folding free energy differences computed from our solubility-dependent statistical potentials. SOLart performances are very good, with a Pearson correlation coefficient between experimental and predicted solubility values of almost 0.7 both in cross-validation on the training dataset and in an independent set of Saccharomyces cerevisiae proteins. On test sets of modeled structures, only a limited drop in performance is observed. SOLart can thus be used with both high-resolution and low-resolution structures, and clearly outperforms state-of-art solubility predictors. It is available through a user-friendly webserver, which is easy to use by non-expert scientists. AVAILABILITY AND IMPLEMENTATION: The SOLart webserver is freely available at http://babylone.ulb.ac.be/SOLART/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology , Escherichia coli Proteins , Solubility , Solvents
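
The SOLart abstract above evaluates the predictor by the Pearson correlation coefficient between experimental and predicted solubility values. As a minimal sketch of that metric, the function below computes Pearson's r from two equal-length lists. The name `pearson_r` and the toy values are illustrative assumptions and are unrelated to the SOLart datasets.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centered products; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical experimental vs. predicted values, NOT SOLart data.
experimental = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(experimental, predicted)
print(f"Pearson r = {r:.2f}")
```

Pearson's r measures linear association only, so a predictor can rank proteins well and still score a modest r if its output scale is nonlinear in the measured solubility.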
14.
Sci Rep ; 9(1): 12043, 2019 08 19.
Article in English | MEDLINE | ID: mdl-31427701

ABSTRACT

Transmembrane proteins play a fundamental role in a wide series of biological processes but, despite their importance, they are less studied than globular proteins, essentially because their embedding in lipid membranes hampers their experimental characterization. In this paper, we improved our understanding of their structural stability through the development of new knowledge-based energy functions describing amino acid pair interactions that prevail in the transmembrane and extramembrane regions of membrane proteins. The comparison of these potentials and those derived from globular proteins yields an objective view of the relative strength of amino acid interactions in the different protein environments, and their role in protein stabilization. Separate potentials were also derived from α-helical and β-barrel transmembrane regions to investigate possible dissimilarities. We found that, in extramembrane regions, hydrophobic residues are less frequent but interactions between aromatic and aliphatic amino acids as well as aromatic-sulfur interactions contribute more to stability. In transmembrane regions, polar residues are less abundant but interactions between residues of equal or opposite charges or non-charged polar residues as well as anion-π interactions appear stronger. This shows indirectly the preference of the water and lipid molecules to interact with polar and hydrophobic residues, respectively. We applied these new energy functions to predict whether a residue is located in the trans- or extramembrane region, and obtained an AUC score of 83% in cross validation, which demonstrates their accuracy. As their application is, moreover, extremely fast, they are optimal instruments for membrane protein design and large-scale investigations of membrane protein stability.


Subjects
Amino Acids/chemistry , Computational Biology , Membrane Proteins/chemistry , Models, Molecular , Algorithms , Computational Biology/methods , Hydrophobic and Hydrophilic Interactions , Protein Conformation , Salts/chemistry , Static Electricity , Structure-Activity Relationship
15.
Bioinformatics ; 35(22): 4794-4796, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31116381

ABSTRACT

MOTIVATION: Interpretation of ubiquitous protein sequence data has become a bottleneck in biomolecular research, due to a lack of structural and other experimental annotation data for these proteins. Prediction of protein interaction sites from sequence may be a viable substitute. We therefore recently developed a sequence-based random forest method for protein-protein interface prediction, which yielded significantly better performance than other methods on both homomeric and heteromeric protein-protein interactions. Here, we present a webserver that implements this method efficiently. RESULTS: With the aim of accelerating our previous approach, we obtained sequence conservation profiles by re-mastering the alignment of homologous sequences found by PSI-BLAST. This yielded a more than 10-fold speedup and at least the same accuracy as reported previously for our method; these results allowed us to offer the method as a webserver. The webserver interface is targeted to the non-expert user. The input is simply a sequence of the protein of interest, and the output is a table with scores indicating the likelihood of having an interaction interface at a certain position. As the method is sequence-based and not sensitive to the type of protein interaction, we expect this webserver to be of interest to many biological researchers in academia and in industry. AVAILABILITY AND IMPLEMENTATION: Webserver, source code and datasets are available at www.ibi.vu.nl/programs/serendipwww/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software , Algorithms , Amino Acid Sequence , Proteins , Sequence Analysis, Protein
16.
Sci Rep ; 8(1): 14661, 2018 10 02.
Article in English | MEDLINE | ID: mdl-30279585

ABSTRACT

The solubility of globular proteins is a basic biophysical property that is usually a prerequisite for their functioning. In this study, we probed the solubility of globular proteins with the help of the statistical potential formalism, in view of objectifying the connection of solubility with structural and energetic properties and of the solubility-dependence of specific amino acid interactions. We started by setting up two independent datasets containing either soluble or aggregation-prone proteins with known structures. From these two datasets, we computed solubility-dependent distance potentials that are by construction biased towards the solubility of the proteins from which they are derived. Their analysis showed the clear preference of amino acid interactions such as Lys-containing salt bridges and aliphatic interactions to promote protein solubility, whereas others such as aromatic, His-π, cation-π, amino-π and anion-π interactions rather tend to reduce it. These results indicate that interactions involving delocalized π-electrons favor aggregation, unlike those involving no (or few) dispersion forces. Furthermore, using our potentials derived from either highly or weakly soluble proteins to compute protein folding free energies, we found that the difference between these two energies correlates better with solubility than other properties analyzed before such as protein length, isoelectric point and aliphatic index. This is, to the best of our knowledge, the first comprehensive in silico study of the impact of residue-residue interactions on protein solubility properties. The results of this analysis provide new insights that will facilitate future rational protein design applications aimed at modulating the solubility of targeted proteins.


Subjects
Amino Acids/chemistry , Models, Chemical , Proteins/chemistry , Amino Acid Sequence , Anions/chemistry , Cations/chemistry , Computer Simulation , Datasets as Topic , Hydrophobic and Hydrophilic Interactions , Protein Conformation , Solubility , Thermodynamics
17.
Bioinformatics ; 33(10): 1479-1487, 2017 May 15.
Article in English | MEDLINE | ID: mdl-28073761

ABSTRACT

MOTIVATION: Genome sequencing is producing an ever-increasing amount of associated protein sequences. Few of these sequences have experimentally validated annotations, however, and computational predictions are becoming increasingly successful in producing such annotations. One key challenge remains the prediction of the amino acids in a given protein sequence that are involved in protein-protein interactions. Such predictions are typically based on machine learning methods that take advantage of the properties and sequence positions of amino acids that are known to be involved in interaction. In this paper, we evaluate the importance of various features using Random Forest (RF), and include as a novel feature backbone flexibility predicted from sequences to further optimise protein interface prediction. RESULTS: We observe that there is no single sequence feature that enables pinpointing interacting sites in our Random Forest models. However, combining different properties does increase the performance of interface prediction. Our homomeric-trained RF interface predictor is able to distinguish interface from non-interface residues with an area under the ROC curve of 0.72 in a homomeric test set. The heteromeric-trained RF interface predictor performs better than existing predictors on an independent heteromeric test set. We trained a more general predictor on the combined homomeric and heteromeric dataset, and show that in addition to predicting homomeric interfaces, it is also able to pinpoint interface residues in heterodimers. This suggests that our random forest model and the features included capture common properties of both homodimer and heterodimer interfaces. AVAILABILITY AND IMPLEMENTATION: The predictors and test datasets used in our analyses are freely available ( http://www.ibi.vu.nl/downloads/RF_PPI/ ). CONTACT: k.a.feenstra@vu.nl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
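For readers unfamiliar with the metric quoted above, the following minimal sketch shows one way to compute the area under the ROC curve from per-residue labels and predictor scores. It uses the rank-based (Mann-Whitney) formulation, in which the AUC is the probability that a randomly chosen interface residue scores higher than a randomly chosen non-interface residue. The function name and the toy labels and scores are hypothetical; nothing here is taken from the authors' implementation.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for interface residues, 0 for non-interface residues.
    scores: predictor outputs, higher meaning "more likely interface".
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: two interface and two non-interface residues.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.3, 0.5, 0.1])
```

The quadratic pairwise loop is chosen for clarity; for large test sets a sort-based computation or an established library routine would be the more practical choice.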


Subjects
Algorithms , Models, Statistical , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Protein Multimerization , Computational Biology/methods , ROC Curve , Sequence Analysis, Protein/methods
18.
PLoS One ; 11(5): e0155251, 2016.
Article in English | MEDLINE | ID: mdl-27166787

ABSTRACT

Large-scale identification of native binding orientations is crucial for understanding the role of protein-protein interactions in their biological context. Measuring binding free energy is the method of choice to estimate binding strength and reveal the relevance of particular conformations in which proteins interact. In a recent study, we successfully applied coarse-grained molecular dynamics simulations to measure binding free energy for two protein complexes with similar accuracy to full-atomistic simulation, but in 500-fold less time. Here, we investigate the efficacy of this approach as a scoring method to identify stable binding conformations from thousands of docking decoys produced by protein docking programs. To test our method, we first applied it to calculate binding free energies of all protein conformations in a CAPRI (Critical Assessment of PRedicted Interactions) benchmark dataset, which included over 19000 protein docking solutions for 15 benchmark targets. Based on the binding free energies, we ranked all docking solutions to select the near-native binding modes under the assumption that the native solutions have the lowest binding free energies. In our top 100 ranked structures, for the 'easy' targets that have many near-native conformations, we obtain a strong enrichment of acceptable or better quality structures; for the 'hard' targets without near-native decoys, our method is still able to retain structures which have native binding contacts. Moreover, in our top 10 selections, CLUB-MARTINI performs comparably to other state-of-the-art docking scoring functions. As a proof of concept, CLUB-MARTINI performs remarkably well for many targets and is able to pinpoint near-native binding modes in the top selections. To the best of our knowledge, this is the first time interaction free energy calculated from MD simulations has been used to rank docking solutions at a large scale.
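The ranking step described above can be sketched as follows: decoys are sorted by binding free energy, lowest first, and the number of near-native structures among the top N is counted. The field names, decoy identifiers and energy values below are hypothetical placeholders, not data or code from CLUB-MARTINI.

```python
def rank_by_free_energy(decoys):
    """Return decoys sorted from lowest (most favourable) to highest dG."""
    return sorted(decoys, key=lambda d: d["dG"])


def top_n_hits(decoys, n):
    """Count near-native decoys among the n lowest-energy solutions."""
    return sum(1 for d in rank_by_free_energy(decoys)[:n] if d["near_native"])


# Hypothetical decoys with made-up free energies (arbitrary units).
decoys = [
    {"id": "d1", "dG": -3.1, "near_native": False},
    {"id": "d2", "dG": -9.4, "near_native": True},
    {"id": "d3", "dG": -7.8, "near_native": True},
    {"id": "d4", "dG": -5.0, "near_native": False},
]
ranked_ids = [d["id"] for d in rank_by_free_energy(decoys)]
```

This mirrors the stated assumption that native solutions have the lowest binding free energies; how near-nativeness is judged (for example, by CAPRI quality criteria) is outside the scope of the sketch.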


Subjects
Algorithms , Molecular Docking Simulation , Molecular Dynamics Simulation , Protein Interaction Mapping/statistics & numerical data , Proteins/chemistry , Binding Sites , Databases, Protein , Kinetics , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Thermodynamics
19.
BMC Bioinformatics ; 16: 325, 2015 Oct 08.
Article in English | MEDLINE | ID: mdl-26449222

ABSTRACT

BACKGROUND: Protein families participating in protein-protein interactions may contain sub-families that have different binding characteristics, ranging from tight binding to showing no interaction at all. Composition differences at the sequence level in these sub-families are often decisive to their differential functional interaction. Methods to predict interface sites from protein sequences typically exploit conservation as a signal. Here, instead, we provide proof of concept that the sequence specificity between interacting versus non-interacting groups can be exploited to recognise interaction sites. RESULTS: We collected homodimeric and monomeric proteins and formed homologous groups, each having an interacting (homodimer) subgroup and a non-interacting (monomer) subgroup. We then compiled multiple sequence alignments of the proteins in the homologous groups and identified compositional differences between the homodimeric and monomeric subgroups for each of the alignment positions. Our results show that this specificity signal distinguishes interface and other surface residues with 40.9% recall and up to 25.1% precision. CONCLUSIONS: To the best of our knowledge, this is the first large-scale study that exploits sequence specificity between interacting and non-interacting homologs to predict interaction sites from sequence information only. The performance obtained indicates that this signal contains valuable information to identify protein-protein interaction sites.
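To make the idea of per-position compositional differences concrete, the sketch below scores each alignment column by the total variation distance between the amino acid frequencies of the homodimer subgroup and those of the monomer subgroup. This particular distance measure is an assumption chosen for simplicity; the abstract does not state which specificity score the authors used, and the sequences shown are invented.

```python
from collections import Counter


def column_specificity(dimer_seqs, monomer_seqs):
    """Per-column total variation distance between two aligned subgroups.

    Both inputs are lists of equal-length aligned sequences. A score of 0
    means identical composition in that column; 1 means no shared residues.
    """
    length = len(dimer_seqs[0])
    scores = []
    for i in range(length):
        dimer_counts = Counter(seq[i] for seq in dimer_seqs)
        monomer_counts = Counter(seq[i] for seq in monomer_seqs)
        residues = set(dimer_counts) | set(monomer_counts)
        distance = 0.5 * sum(
            abs(dimer_counts[r] / len(dimer_seqs)
                - monomer_counts[r] / len(monomer_seqs))
            for r in residues
        )
        scores.append(distance)
    return scores


# Hypothetical three-column alignment: only the middle column differs.
specificity = column_specificity(["AKL", "AKL"], ["AEL", "ADL"])
```

Columns with high scores would be candidate specificity positions; turning such scores into interface predictions, and evaluating them by precision and recall as in the abstract, requires a threshold and structural labels that are not shown here.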


Subjects
Proteins/chemistry , Area Under Curve , Dimerization , Protein Interaction Domains and Motifs , Proteins/metabolism , ROC Curve
20.
Nan Fang Yi Ke Da Xue Xue Bao ; 33(10): 1508-11, 2013 Oct.
Article in Chinese | MEDLINE | ID: mdl-24144757

ABSTRACT

OBJECTIVE: To analyze the association between the T393C single nucleotide polymorphism (SNP) of the GNAS1 gene and non-valvular atrial fibrillation (AF) in Chinese Han patients. METHODS: Ninety patients with non-valvular AF and 90 healthy subjects were examined for the T393C SNP of the GNAS1 gene using polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP). The genotypes and the distribution of allele frequencies were analyzed and compared between the two groups. The relationship between allele frequency distribution characteristics and heart rate variability (HRV) was also studied to analyze the association between the T393C SNP of the GNAS1 gene and autonomic nervous activation in non-valvular AF. RESULTS: The two groups showed a significant difference in the frequencies of genotypes of the T393C SNP of the GNAS1 gene and in allele frequencies (P<0.01). CC genotype and T393C allele frequency were significantly increased in the case group. pNN50, LF, and LF/HF showed no significant difference between different genotypes (P>0.05). CONCLUSIONS: The T393C SNP of the GNAS1 gene is closely associated with non-valvular AF in Chinese Han patients.
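A case-control comparison of allele frequencies like the one described above is commonly summarized with a Pearson chi-square test on a 2x2 table of allele counts. The abstract does not say which test was used, so the sketch below is an assumed, generic illustration; the function name is hypothetical and no counts from the study are used.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

    Rows are groups (cases, controls); columns are alleles (e.g. C, T):
        cases:    a  b
        controls: c  d
    The statistic has 1 degree of freedom; values above about 3.84
    correspond to P < 0.05.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Sanity checks on artificial tables (not study data): identical allele
# distributions give 0, fully separated distributions give n.
no_difference = chi_square_2x2(10, 10, 10, 10)
full_separation = chi_square_2x2(20, 0, 0, 20)
```

With 90 subjects per group, each row of such a table would total 180 alleles.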


Subjects
Atrial Fibrillation/genetics , GTP-Binding Protein alpha Subunits, Gs/genetics , Polymorphism, Single Nucleotide , Adult , Aged , Aged, 80 and over , Alleles , Asian People/genetics , Atrial Fibrillation/metabolism , Atrial Fibrillation/physiopathology , Chromogranins , Female , Gene Frequency , Genotype , Heart Rate , Humans , Male , Middle Aged , Polymerase Chain Reaction , Polymorphism, Restriction Fragment Length , Risk Factors